The Shift-add Approach to String Matching
Abstract
String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation. Recent surveys of string searching can be found in [4, 18]. The string matching problem consists of finding all occurrences of a pattern of length m in a text of length n. We generalize the problem, allowing don't care symbols, the complement of a symbol, and any finite class of symbols. We solve this problem for one or more patterns, with or without mismatches. For small patterns the worst case time is linear in the size of the text (we say that a pattern is small if m is bounded by a constant).
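As a rough illustration of the idea summarized in the abstract, the following is a minimal sketch of a Shift-Add style search, not the authors' implementation. It keeps one small mismatch counter per pattern position, packs all counters into a single integer, and updates them with one shift and one addition per text character. The function name `shift_add_search` and the pattern representation (a list whose items are either a string of allowed symbols or `None` for a don't care symbol) are assumptions made for this example. Because Python integers are unbounded, the sketch does not model the machine-word limit behind the "small pattern" condition.

```python
def shift_add_search(pattern, text, k=0):
    """Return start positions where pattern matches text with at most k mismatches.

    pattern: list of items; each item is a string of allowed symbols (a class)
             or None for a don't care symbol. (Hypothetical representation.)
    """
    m = len(pattern)
    if m == 0:
        return []
    b = m.bit_length()            # bits per counter: enough to hold 0..m without carry
    field = (1 << b) - 1          # mask for a single counter
    mask = (1 << (b * m)) - 1     # mask for all m counters
    table = {}                    # per-symbol mismatch vectors, built lazily

    def mismatch_vector(c):
        # Counter i gets 1 if symbol c does NOT match pattern position i.
        v = table.get(c)
        if v is None:
            v = 0
            for i, allowed in enumerate(pattern):
                if allowed is not None and c not in allowed:
                    v |= 1 << (b * i)
            table[c] = v
        return v

    state = 0
    hits = []
    for j, c in enumerate(text):
        # Shift every counter one position up, then add the new mismatches.
        state = ((state << b) + mismatch_vector(c)) & mask
        # The top counter holds the mismatches of the full pattern ending at j.
        if j >= m - 1 and ((state >> (b * (m - 1))) & field) <= k:
            hits.append(j - m + 1)
    return hits


print(shift_add_search(["a", None, "c"], "abcaxcadc"))   # don't care in the middle
print(shift_add_search(list("abc"), "abdabc", k=1))      # up to one mismatch
```

The complement of a symbol is not given its own notation in this sketch; it could be expressed as a class that lists every other symbol of the alphabet.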
Similar resources
Improved Two-Way Bit-parallel Search
New bit-parallel algorithms for exact and approximate string matching are introduced. TSO is a two-way Shift-Or algorithm, TSA is a two-way Shift-And algorithm, and TSAdd is a two-way Shift-Add algorithm. Tuned Shift-Add is a minimalist improvement to the original Shift-Add algorithm. TSO and TSA are for exact string matching, while TSAdd and tuned Shift-Add are for approximate string matching ...
Bit-parallel string matching under Hamming distance in O(n⌈m/w⌉) worst case time
Given two strings, a pattern P of length m and a text T of length n over some alphabet Σ, we consider the string matching problem under k mismatches. The well-known Shift-Add algorithm (Baeza-Yates and Gonnet, 1992) solves the problem in O(n⌈m log(k)/w⌉) worst case time, where w is the number of bits in a computer word. We present two algorithms that improve this result to O(n⌈m log log(k)/w⌉)...
Boyer-Moore Strategy to Efficient Approximate String Matching
We propose a simple but efficient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which involves representing by a bit number the current state of the search and uses the ability of programming languages to handle bit words. State re...
A fast implementation of the Boyer–Moore string matching algorithm
String matching is the problem of finding all the occurrences of a pattern in a text. We present a new method to compute a combinatorial shift function (“best matching shift”) of the well-known Boyer–Moore string matching algorithm. Moreover we conduct experiments showing that the algorithm using this best matching shift is the most efficient in particular cases such as the search for patterns ...
Improved Approach for Exact Pattern Matching
In this research we present the Bidirectional exact pattern matching algorithm [20] in detail. Bidirectional (BD) exact pattern matching (EPM) introduces a new idea: comparing the pattern with a Selected Text Window (STW) of the text string by using two pointers (right and left) simultaneously in the searching phase. In the preprocessing phase, the Bidirectional EPM algorithm improves the shift decision by comparing righ...
The Shift-Match Number and String Matching Probabilities for Binary Sequences
We define the “shift-match number” for a binary string and we compute the probability of occurrence of a given string as a subsequence in longer strings in terms of its shift-match number. We thus prove that the string matching probabilities depend not only on the length of shorter strings, but also on the equivalence class of the shorter string determined by its shift-match number. PA...